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An  approach  to  spectral  estimation  is  described  which 
involves  the  simultaneous  use  of  frequency,  time,  and 
quantile  domain  algorithms,  and  is  called  quantile  spectral 
analysis .  It  is  based  on  the  premise  that  while  the  specfrum 
is  a  non-parametric  concept,  its  estimation  cannot  be  a 
non-parametric  procedure  to  be  conducted  independently  of 
model  identification.  We  discuss:  the  goals  of  spectral 
analysis,  quantile  data  analysis,  identification  of  memory 
(no,  short,  long),  index  of  regular  variation  of  a  spectral 
density,  autoregressive  spectral  estimation,  and  ARMA  model 
identification  by  estimating  MA(C))  and  subset  regression. 

An  illustrative  example  is  given  of  quantile  spectral 
analysis . 
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1.  Introduction  to  a  theory  of  spectral  synthesis 

Statistical  Spectral  Analysis  appears  to  be  a  subject  of 
considerable  controversy  as  to  how  to  do  it  and  whether  to  do 
it.  In  many  fields  of  engineering  and  physical  sciences,  its 
importance  for  applications  is  well  recognized.  In  other  fields 
(notably  economics)  its  value  is  still  debated.  One  reason  for 
this  may  be  the  difficulty  of  analysis  of  time  series  with  trends 
or  very  slowly  decaying  correlations  or  very  low  frequency  cycles 
or  spectral  densities  with  very  large  dynamic  range.  A  single 
name  for  such  time  series  is  "long  memory"  time  series. 

This  paper  describes  an  approach  to  time  series  analysis 
which  attempts  to  use  simultaneously  diverse  domains  of  analysis, 
and  thus  to  meet  the  needs  of  all  the  possible  fields  of 
application  of  time  series  analysis.  It  also  aims  to  integrate 
spectral  and  correlation  methods  with  methods  for  long  memory 
and/or  long  tailed  time  series. 

The  correlation  function  and  spectrum  are  basic  non-parametric 
(or  functional)  parameters  used  to  model  and  data  analyze  time 
series.  Estimation  of  the  correlation  function  and  of  the 
spectrum  represent  two  of  the  basic  tools  used  for  descriptive 
data  summaries  of  observations  and  to  guess  parametric 
probability  models  to  fit  to  observations.  The  spectrum  is 
important  also  as  a  major  concept  in  terms  of  which  to  analyze 
the  effect  of  passing  random  processes  (representing  either 
signal  or  noise)  through  linear  (and,  to  some  extent,  non-linear) 
sys  terns . 
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Correlation  and  spectrum  are  examples  of  non-parametric 
signatures  of  parametric  models.  We  believe  that  such  signatures 
provide  key  (and  two-key)  methods  for  achieving  the  goals  of 
time  series  analysis  (and  statistical  data  analysis).  The  goals 
are  to  find:  "Theories  to  fit  (attest)  the  (statistical)  facts" 
and  "statistical  facts  to  fit  (test)  theories."  By  fitting 
theories  to  facts  one  means  either  statistical  models  (to 
describe  the  statistical  behavior  of  the  data)  or  scientific 
models  (to  explain  the  statistical  models  fitted  by  the  data) . 

By  statistical  facts  to  test  theories  one  means  the  estimation 
of  characteristics  of  non-parametric  statistical  models 
(significant  time  lags,  significant  frequencies,  and  memory); 
such  parameters  (estimated  non-parametrically)  represent 
descriptions  of  a  real  process  which  an  acceptable  (or 
parametric  model)  must  explain.  The  goals  of  time  series 
analysis  can  be  stated  simply:  seek  models  which  fit  curves  (or 
fit  samples) ,  where  fit  is  measured  by  the  degree  of  scientific 
insight  provided  into  underlying  physical  mechanisms. 

The  approach  to  spectral  estimation  described  in  this  paper 
involves  the  simultaneous  use  of  diverse  algorithms  for  time 
series  analysis  (it  could  be  called  spectral  synthesis) .  Our 
approach  is  based  on  a  premise  that  might  appear  paradoxical: 
while  the  spectrum  is  a  non-parametric  concept ,  its  estimation 
cannot  be  a  non-parametric  procedure  to  be  conducted 


independently  of  model  identification. 
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To  form  a  spectral  estimator  one  must  identify  the  memory 
type  of  the  time  series,  which  we  classify  into  one  of  3 
types : 

a.  No  memory  or  white  noise, 

b.  Short  memory  or  stationary  with  finite  spectral 
dynamic  range, 

c.  Long  memory. 

A  short  memory  time  series  is  modeled  parametrically  by 
the  invertible  filters  which  transforms  it  to  white  noise 
whose  type  (AR,  MA,  or  ARMA)  one  must  identify. 

A  long  memory  time  series  is  modeled  parametrically  by 
an  operator  which  transforms  it  to  a  short  memory  time  series; 
such  operators  are  non- invertible  filters  or  representations  as 
the  sum  of  a  long-memory  signal  and  a  short  memory  noise. 

The  goal  of  the  time  series  analyst  is  often  defined  to  be 
either  a  time  domain  model  or  a  spectral  analysis.  Our  approach 
maintains  that  the  two  domains  must  be  employed  simultaneously 
because  the  choice  of  final  answer  must  be  based  on  having  a 
satisfactory  interpretation  in  both  domains.  Additional  domains 
(involving  memory,  information,  and  quantiles)  are  utilized  in 
our  approach  to  time  series  model  identification,  especially 
new  diagnostic  measures  (or  model  signatures) ,  based  on 
"quantile  data  analysis"  of  spectral  density  and  correlation 
functions.  These  new  model  signatures  represent  an  application 
to  time  series  analysis  of  new  time-series  theoretic  methods  of 


statistical  data  analysis  of  probability  distributions  which 
we  call  Quantile  Data  Analysis  and  Functional  Statistical 
Inference  (abbreviated  FUN.STAT). 

The  FUN.STAT  approach  to  statistical  data  analysis  is 
based  on  isomorphisms  between  properties  of  spectral  density 
functions  and  density-quantile  functions.  One  of  the  rewards 
of  this  isomorphism  is  an  important  diagnostic  of  time  series 
memory  called  the  index  6  of  regular  variation  of  a  spectral 
density  at  frequency  w. 


5 


2.  How  to  define  the  spectrum 

As  the  goal  of  the  theory  of  spectral  analysis,  we  propose 
that  we  adopt  the  goal  stated  by  Wiener  (1930)  in  his  celebrated 
pioneering  paper  which  introduced  generalized  harmonic  analysis. 
Wiener  defined  the  goal  of  spectral  analysis  to  be:  to  improve, 
and  make  rigorous,  Schuster's  concept  of  the  periodogram  of  a 
sample.  We  consider  only  discrete  parameter  time  series  Y(t), 
t=0 ,  +1,  ....  A  sample  is  the  finite  (but  increasing)  number 
of  observations  Y(t),  t=l,2,...,T. 

To  detect  the  "hidden  periodicity"  uo  in  the  model 

(1)  Y(t)  =  A  cos  (i)Qt  +  B  sin  (jjQt  +  N(t)  , 

where  N(t)  is  white  noise  (a  sequence  of  independent 
identically  distributed  random  variables  with  finite  second 
(and  possibly  higher)  moments],  Schuster  (1898)  proposed 
calculating  the  function  S^(w)  ,  -0 . 5 <u><0 .5  [we  take  0<u)£l]  , 
defined  by 

T 

(2)  St(uj)  -  £  [  £  Y(t)  exp  (- 2rrituj)  2 

which  we  call  the  sample  unnormalized  spectral  density  or 
periodogram  or  sample  power  spectrum. 

When  the  time  series  obeys  the  model  (1),  one  can  show  that 
St(oj0)  tends  to  «  as  T  tends  to  «.  Therefore  one  might  interpret 
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local  maxima  of  S,j.(u>)  as  indicating  "significant  frequencies" 
representing  "hidden  periodicities."  However  the  graph  of 
St(cj)  is  often  a  very  wiggly  function,  and  one  obtains  many 
bumps  in  the  spectrum  representing  "spurious  periodicities." 

A  traditional  approach  in  statistical  communication 
engineering  textbooks  to  defining  the  spectrum  S(w)  of  a  time 
series  Y(t),  t=0,  +1,...  has  been 

(3)  S(u>)  =  lim  S_  (oj) 

Unfortunately  for  those  who  would  like  the  world  to  be 
simple  [but  fortunately  for  those  who  enjoy  the  deeper  beauty 
of  a  more  complicated  reality  and  the  accompanying  theory]  the 
limit  of  ST(w)  does  not  exist  in  any  usual  mode  of  convergence. 
The  story  of  spectral  analysis  starts  with  the  study  of  the 
limiting  behavior  of  ST(u>)  ,  -0.5<u><0.5,  especially  how  to  use 
it  to  define  statistically  significant  signatures  of  time  series 
samples.  To  solve  the  problem  of  interpreting  S^,(u)  ,  Wiener 
(1930)  proposed  a  "radical  recasting"  based  on  the  Fourier 
representations 

T 

ST(a))  -  l  exp  (-27rivu>)  R^Cv)  , 
v=-T 

0.5 

/  exp  (2ivivu>)  S_,(w)  du> 

-0.5  T 


(4) 


where  R^Cv)  is  the  sample  covariance  function 


^(v)  =  £  I  Y (t+v)  Y(t)  .  v=0, 1 . T-l 

(5) 

=  0  .  v>T, 

=  KpC-v) ,  ,  v<0 

The  sample  correlation  function  is  defined  by 

(6)  px(v)  =  RjCv)  v  1^(0) 

Wiener's  theory  of  generalized  harmonic  analysis  is  based 
on  assuming  the  existence  of  the  limit 


(7)  p(v)  =  lim  pT(v) 

T+oo  A 

One  calls  p(v)  the  (asymptotic)  correlation  function. 

One  can  show  that  p(v)  is  a  function  of  non-negative  type: 
for  any  set  of  complex  numbers  c^,...,c  and  integers  vi*--->vn 

(8)  I  ctc  *  p (vi- v. )  >  0 

i.j  =  l  J  J 

where  c*j  denotes  the  complex  conjugate  of  c^ .  Consequently, 
there  exists  a  bounded  non-decreasing  function  of  a  real  variable 
go,  denoted  F(uj),  and  called  the  spectral  distribution  function. 
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such  that 

0.5  ~  . 

(9)  p(v)  =  /  e27I1VuldF(u>)  ,  v=0,  +1 . 

-0.5 

The  spectral  density  function  f(to)  is  defined  as  the  derivative 
of  the  absolutely  continuous  part  of  F(oj).  If  F(uj)  is  itself 
absolutely  continuous,  then 

p(v)  =  e^v^vulf  (oj)  d  to  . 

(10)  f(w)  =  l  e“27livojp(v) 

V=-  00 

A  sufficient  condition  for  the  spectral  density  f(ui)  to  exist  is 
that  the  correlation  function  p(v)  tend  to  0  (as  v-*-°°) 
sufficiently  fast  that 

(11)  l  |p(v)|  or  l  |  p  (v)  | 2  <  °°  . 

y=r-  oo  V=  ~  00 

A  probability  model  under  which  the  asymptotic  correlation 
function  p(v)  may  be  proved  to  exist  is  the  following: 

Y(t),  t=0,  +1,  ...  is  a  zero  mean  covariance  stationary  time 
series  with  covariance  function 


(12)  R(v)  =  E [ Y (t+v)  Y(t)J 
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and  correlation  function  p(v)  =  R(v)  *  R(0) .  When  the  time 
series  is  Gaussian,  a  necessary  and  sufficient  condition  for 
PT(v)  to  converge  (almost  surely)  to  p(v)  is  that 

^  T 

(13)  lim  m  l  p2(v)  =0. 

T->oo  L  V=1 

This  is  another  example  of  a  condition  on  p(v)  which  reflects 
the  rate  of  decay  to  zero  of  p(v). 

A  necessary  and  sufficient  condition  for  a  zero  mean  Gaussian 
covariance  stationary  time  series  to  be  white  noise  is  that 
p(v)  =  0  for  v  f  0 .  It  is  natural  to  conclude  that  one  can 
distinguish  three  types  of  time  series: 

,  T 

T  1  P2  (v)  =  0  for  all  T 
1  T 

t  l  P2  (v)  0  as  T  +  “> 

1  v=l 

1  T 

7  l  P2(v)  i  0 
1  v=l 

One  can  argue  that:  (1)  an  optimal  non-parametric 

estimator  p(v)  of  p(v)  is  given  by  p(v)  =  pt(v);  and  (2)  one 

can  base  a  test  of  the  hypothesis  H  of  white  noise  on  a 

o 

suitable  number  of  sample  correlations  p(v),  v=l,2,...,n.  Under 
Hq,  the  asymptotic  distribution  of  pT(v)  is  independent  Gaussian 
random  variables  with  zero  means  and  variance  T-^.  To  test  H 


no  memory 


short  memory 


long  memory 
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one  can  test  whether  p(v),  v-l,...,n,  satisfy  the  hypothesis 
that  they  are  a  random  sample  from  a  normal  distribution  with 
mean  0  and  variance  1/T.  This  is  the  kind  of  hypothesis  for 
which  FUN.STAT  provides  tools. 

A  quick  and  dirty  diagnostic  of  time  series  memory  type 
is  provided  by  the  value  of  the  correlations  mean  square 

CORRMS  =  J  I  P2(v) 
n  i 
V=1 

Computation  of  CORRMS  for  samples  of  size  200  indicate  that  very 
approximately  0.004  <_  CORRMS  <  0.1  indicates  short  memory  (then 
CORRMS  _<  0.004,  no  memory;  CORRMS  .1,  long  memory). 

The  sample  spectral  density 

T 

f(iu)  =  £  exp  (-2itivw)  p(v) 

v=-T 

is  computed  by  first  computing  the  sample  Fourier  transform 
T 

tj>(w)  =  l  Y (t)  exp  ( - 2 nitut > 
t=l 

at  an  equi-spaced  grid  of  frequencies  in  0<uj<1  of  the  form 
w  =  k/S,  k=0,l,...,S  -  1.  We  call  S  the  spectral  computation 
number;  one  should  choose  S  ^  T  +  M,  where  M  is  the  maximum 

A 

lag  at  which  one  computes  sample  correlations  p (v) . 

The  sample  spectral  density  f(u>),  O^uKl,  is  computed  at 
id  =  k/S  by  squaring  and  normalizing  the  sample  Fourier  transform: 
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f(»)  -  I*(U)I2  *  i  Sj1  licfel2 

k=0  5 

When  a  time  series  is  no  memory  (or  white  noise) ,  and  has 
finite  second  moments,  it  is  a  basic  theorem  of  time  series 

analysis  that  asymptotically  the  random  variables  f(k/S),  k=l . 

[S/2],  are  identically  distributed  as  an  exponential  distribution 
with  mean  1.  Therefore  tests  for  white  noise  can  be  obtained  by 
quantile  data  analysis  based  tests  for  exponentiality  of  the 
sample  spectral  density  f(u))  at  suitable  frequencies  (if  the 
informative  quantile  function  of  the  original  data  does  not 
indicate  a  long  tailed  distribution) . 

Powerful  discriminators  of  memory  type  are  SPECMED  (the 
median)  and  SPECVAR  (the  variance)  of  the  data  batch  of  values 
of  the  sample  spectral  density.  For  no  memory  these  are  the 
median  and  variance  of  an  exponential  distribution  with  mean  1: 

SPECMED  =  log  2  =  .69,  SPECVAR  =  1. 

Computation  of  these  signatures  for  samples  of  size  200  indicate 
that  short  memory  corresponds  very  approximately  to 

.1  <  SPECMED  <  .6,  1.5  <  SPECVAR  <  20 

Deeper  diagnostics  of  time  series  memory  are  provided  by 
quantile  data  analysis  of  f(ui),  especially  plots  of  informative 
quantile  function  IQ(u)  and  comparison  distribution  functions 
D(u)  . 


The  insights  into  model  identification  provided  by  the 
notion  of  memory  are  captured  not  by  definitions  in  terms  of 
correlations  (or  even  partial  correlations)  but  by  definitions 
in  terms  of  the  dynamic  range  of  the  spectral  density  function 
and  sample  spectral  density,  defined 

SPECRNG  =  {max  log  f(u)  -  min  log  f(w)} 

U)  CO 

Dynamic  range  classification  of  memory  of  a  time  series: 
no  memory  =  dynamic  range  =  0  , 

short  memory  =  0  <  dynamic  range  <  <»  , 

long  memory  =  dynamic  range  =  00 

When  the  dynamic  range  is  finite,  we  can  assume  that  the 
spectral  density  f(io)  is  bounded  above  and  below:  for  some 
constants  c-^  and  C2,  0<c^<_f (w)<C2<°°  for  all  to. 

The  operations  which  transform  a  long  memory  time  series 
to  a  short  memory  one  (or  which  represent  a  long  memory  time 
series  in  terms  of  a  short  memory  one)  can  be  considered  a 
parametric  time  domain  model.  Nonparametric  descriptions  of 
long  memory  properties  can  be  defined  in  terms  of  the  index  of 
regular  variation  of  the  spectral  density  at  a  specified 
frequency,  usually  zero  frequency.  As  w+0,  the  spectral  density 
f(u>)  is  assumed  to  be  a  regularly  varying  function,  with  the 
representation 
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f  (cd)  =  Id  ^L((d) 

where  L(td)  is  a  slowly  varying  function.  The  value  of  6  is  an 
index  of  length  of  memory,  since 

No  and  short  memory  =  6=0 

Long  memory  =6/0 

Long  memory  time  series  models  have  spectral  density  f  (td) 
satisfying  the  regular  variation  representation.  The  index 
6<0  (6>0)  corresponds  to  a  zero  (infinite)  value  for  f(td)  at 

(d=0  . 

Determining  the  degree  of  differencing:  When  a  time  series 
Y (t)  can  be  transformed  to  a  stationary  time  series  Z(t)  by 
differencing  d  times,  one  can  think  of  the  "spectral  density" 
f^Ccd)  of  Y(-)  as  having  the  representation 

fY(<d)  =  |l-e  I  f  z((d) 

which  is  a  special  case  of  assuming  that  fy(w)  is  regularly 
varying  at  td=0  with  index  6=2d.  Estimators  for  6  can  provide 
techniques  for  estimating  d. 

Self- similarity:  When  a  spectral  density  f(td)  is  regularly 

varying  at  oj=0  it  enjoys  a  property  called  approximate  self- 

similarity:  f(ytd)  =  y"  f (co)  in  the  sense  that,  for  any  y>0, 

_  ^ 

f(ytd)  *  y  f  (td)  -+  1  as  (d+0. 
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3.  Quantile  Data  Analysis 

The  probability  distribution  of  a  random  variable  X  has 
been  traditionally  described  by  its  distribution  function 
F(x)  =  Pr[X£x]  and  its  probability  density  function  f(x)  =  F'(x). 
Given  a  sample  X^, . . . ,X  of  X,  the  models  that  we  seek  to  fit 
to  the  data  are  usually  parametric  models  of  the  form 

F(x)  -  F0(S^) 


for  parameters  u  and  a  (representing  location  and  scale 

respectively)  to  be  estimated,  and  F  (x)  a  known  distribution 

o 

function.  The  most  important  cases  of  Fq(x)  are: 

x 

normal  Fq(x)  =  *00  =  /  4>(y)  dy 

—  oo 

4> (y)  =  exp  -  2  yZ  ; 

exponential  Fq(x)  =  1  -  e  x  ,x>0 

The  quantile  function  Q(u),  0<u<l,  defined  by 

Q(u)  =  F  ^(u)  =  inf  {x:  F(x)  _>  u)  , 

can  be  regarded  as  inverse  of  the  distribution  function.  When 
F(x)  is  continuous,  FQ(u)  =  u.  Quantile  data  analysis  estimates 


non-parametrically  Q(u) ;  the  quantile- density  function 
q(u)  =  Q' (u) ;  and  the  density- quant lie  function 

fQ(u)  =  f (F  1(u)  =  {q(u)}"1 

The  tail  behavior  of  a  distribution  can  be  described  by 
indices  aQ  and  in  the  following  representations  of  the 
density-quantile  function  as  a  regularly  varying  function: 

fQ(u)  =  u  °  Lq(u),  fQ(u)  =  (1-u)  1  L1(u) 

where  (u)  is  a  slowly  varying  function  as  u+j . 

The  definition  of  a  function  L(u)  slowly  varying  at  u=0 
is:  for  every  y,  in  0<y<l, 

log  L(yu)  -  log  L(u)  -*•  0  as  u  -*■  0 

We  assume  in  addition  that 

iio  ^o  {log  L(yu)  '  log  u>  dy  =  0 

Then  one  can  compute  aQ  by 

-«0  ■  iio  ^ o  {log  fQ<yu>  "  log  fQ<u>>  dy- 
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We  digress  to  note  that  for  a  spectral  density  function 
f  (to) ,  0<uj<1  one  assumes  a  representation  (at  oi=0) 

f  (to)  =  uf  6  L(u)) 

We  call  6  the  index  of  regular  variation  at  frequency  oj=0. 

It  is  computed  by 

6  =  iio  fl  {1°s  f<y^  -  los  dy 

Estimation  of  the  density-quantile  function  fQ(u)  can  be 
treated  by  similar  procedures  as  are  used  to  estimate  spectral 
densities  f(ui).  However  much  insight  into  the  distributions 
that  might  fit  a  sample  can  be  obtained  by  non-parametrically 
estimating  the  quantile  function  and  the  amazingly  useful 
informative  quantile  function 

Q(u)  -y-, 
lQ(u)  -  -5r- i 

where  u-^  and  are  universal  estimators  of  location  and  scale 
respectively.  We  propose  =  0(0.5),  =  Q'(0.5)  =  q(0.5). 

The  IQ  function  is  plotted  with  a  vertical  scale  from  -1 
to  1;  its  values  are  truncated  when  they  exceed  +1.  For  ease 
of  interpretation  of  the  IQ  function,  we  also  plot  the  IQ 
function  of  the  uniform  distribution  which  is  a  straight  line 


passing  through  (0,  -.5)  and  (1,  .5).  A  plot  of  IQ(u)  is 
accompanied  by  its  values  at  u  =  0.01,  0.05,  0.10,  0.25,  0.75, 
0.90,  0.95,  0.99. 

Natural  quick  and  dirty  estimators  of  are 

op  =  (Q(0 . 5  +  p)  -  Q (0 . 5  -  p)}  i  2p  ; 

where  0£p<0.5;  our  preferred  estimator  is  Oq  ^5  which  equals 
twice  the  interquartile  range: 

o  25  =  2  {Q(0 . 75)  -  Q(0 . 25> } 

To  estimate  these  non-parametric  parameters  from  a  sample 
of  size  n,  we  form  a  sample  quantile  function  Q(u)  by  linear 
interpolation  of  the  values 


where  X,  < . . . <X  are  the  order  statistics  of  the  sanrole. 
in—  —  nn 

When  the  sample  mean  Y  is  large,  it  is  necessary  to  trans¬ 
form  Y(t)  to  Y(t)  -  Y;  otherwise  one  would  always  obtain  a 
diagnostic  that  Y(*)  is  a  long  memory  time  series.  An  alternative 
first  step  in  time  series  analysis  is  to  replace  Y(t)  by 
(Y(t)  -  Q (0 . 5) }  v  2{Q(0 . 75)  -  Q(0.25)} 
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One  can  test  (before  parameter  estimation)  the  goodness  of 
fit  of  a  sample  to  F(x)  *  Fo for  some  y  and  o  by  introducing 
the  weighted  spacings 

d(u)  -  i-  f  0  (u)  q(u) 
o0  °  ° 

where:  ^Q^o^u^  =  fQ(F  ^.(u))  is  the  density-quantile . function 

of  the  specified  distribution;  q(u)  =  Q' (u)  is  the  sample 
quantile  density  function  (expressible  in  terms  of  spacings, 
or  differences  of  successive  order  statistics) ;  and 

°o  =  /o  foQo(u)  <l(u)  du 
The  test  function  is 

D(u)  =  d(t)  dt,  0<u<l, 

which  one  compares  with  the  uniform  distribution  D(u)  *  u, 

CKu<l.  We  call  D(u)  the  sample  comparison  distribution  function, 

or  the  cumulative  weighted  spacings  function  [Parzen  (1979)]. 

~  k 

The  data  batch  f(^),  k=0 ,  l,...,S/2,  is  tested  for 
exponentiality  by  forming  its  informative  quantile  function  IQ(u) 
and  its  cumulative  weighted  spacings  function  D(u) ,  with 
foQ0(u)  ~  1-u.  How  one  interprets  the  quantile  data  analysis  of 
the  sample  spectral  density  is  best  illustrated  by  examples. 


4.  Autoregressive  spectral  estimation 


The  concept  of  an  autoregressive  representation  of  a  time 
series  can  be  defined  from  several  points  of  view.  When  one's 
goal  is  spectral  estimation  based  on  improving  Schuster's 
definition  of  the  spectrum,  it  seems  natural  to  adopt  the 
viewpoint  of  a  deconvolution  filter  representation: 

I  a  (j)  Y(t-j)  «  c(t)  ,  t-0,  +1,  ... 

j-0  P 

One  seeks  to  choose  the  order  p  and  coefficients  a  (j)  so  that 
the  time  series  e(t)  is  parsimoniously  white  noise  (just  barely 
passes  tests  for  white  noise).  From  the  deconvolution  filter 
representation  of  Y(t)  one  obtains  approximately  the  following 
formula  for  sample  spectral  density  functions: 


fY(u)> 


j-o 


ap(j) 


exp  (2TTiu)j) 


=  o2 


(to) 


where 

Op  =  l  e 2 (t)  *  l  Y2 (t) 
p  t=l  t=l 


It  is  important  to  note  that  a  near  zero  value  of  is 
regarded  as  evidence  that  memory  type  is  long. 

When  fG(to)  behaves  as  the  sample  spectral  density  of  white 
noise,  we  can  estimate  it  by  the  constant  1.  As  potential 


estimates  of  the  spectral  density  fy(w)  of  the  time  series  Y(t) 
we  consider  the  sequence  of  autoregressive  spectral  estimators 


f  (w) 
P 


=  °D  f  ^(j)  eXP 

P  j=0  p 


for  p=0,l,2,...  with  coefficients  3p(j)  computed  by  suitable 

A 

algorithms.  The  sequence  f  (u>) ,  p=0,l,2,...,  should  be 
regarded  as  a  sequence  of  functions  which  smooth  the  sample 
spectral  density.  They  become  increasingly  wiggly  as  p  tends 
to  T,  and  eventually  coincide  with  fyCoj). 

To  estimate  the  autoregressive  coefficients  a^Q), 

p=l,2,...  and  j=l,2 . p,  two  main  methods  are  available  of 

which  important  representatives  are  algorithms  called 

1.  Yule-Walker  (stationary) 

2.  Burg  (non-stationary) 

The  optimality  properties  of  the  methods  depends  on  the 
memory  type  of  the  time  series.  Theoretical  and  empirical 
evidence  indicate  that  Yule-Walker  and  Burg  estimators  agree 
for  short  memory  time  series  (which  can  be  shown  to  be  always 
representable  as  an  invertible  infinite  autoregression) .  For 
long  memory  time  series  which  one  assumes  to  possess  an 
autoregressive  representation,  Yule-Walker  and  Burg  estimators 
usually  differ  significantly;  further  Burg  (and  least  squares) 
estimators  are  consistent  estimators  while  Yule-Walker  are  not. 

We  propose  the  following  practical  consequences  of  these 
facts.-  given  a  time  series  sample,  compute  autoregressive 
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coefficients  by  both  Yule-Walker  and  Burg  estimators  (for 
orders  p  to  be  determined  from  the  data  as  described  below) . 

Check  whether  the  two  ways  of  estimation  yield  similar  results. 

A  yes  answer  is  evidence  that  the  memory  type  is  short,  or  if 
memory  is  long  that  an  autoregressive  model  may  fit.  If 
memory  is  short,  identify  an  ARMA  scheme  by  the  procedures  of 
the  next  section.  A  no  answer  is  evidence  that  the  memory  type 
is  long;  using  the  diverse  signatures  introduced,  one  should 
determine  operations  which  just  barely  transform  the  long 
memory  time  series  to  a  short  memory  one  (especially  a  model  as 
a  sum  of  long  memory  signal  plus  short  memory  noise) . 

Solutions  to  the  important  problem  of  determining  the 
order  of  approximating  autoregressive  schemes  can  be  approached 
using  order  determing  criterion  functions,  especially  AIC  (due 
to  Akaike)  and  CAT  (due  to  Parzen) ,  which  are  formed  from  the 

A 

sequence  o^,  p=l,2 .  We  recommend  that  one  determine  two 

orders  (called  best  and  second  best)  rather  than  a  single  order. 

The  best  (second  best)  order  is  that  at  which  the  criterion 
function  achieves  its  minimum  (second  lowest  relative  minimum) . 

/v 

The  maximum  of  these  orders  (denoted  p)  is  used  as  the  order  for 
least  squares  (or  Burg)  estimation  of  autoregressive  coefficients. 
An  order  determining  criterion  should  be  used  only  to  suggest 

A 

autoregressive  orders  p  for  which  f  (uj)  is  a  satisfactory 
spectral  estimator.  The  ultimate  criterion  of  goodness  of  fit 
of  an  autoregressive  spectral  estimator  is  that 
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dp(0))  =  fY(W)  r  f  (W) 

just  barely  satisfies  the  hypothesis  that  it  is  the  sample 
spectral  density  of  white  noise.  The  notation  d(io)  is  chosen 
to  convey  that  this  function  has  properties  similar  to  that 
of  d(u)  introduced  in  the  preceding  section  for  testing 
goodness  of  fit  of  probability  distributions. 

While  autoregressive  spectral  estimation  can  be  performed 
for  long  memory  time  series  obeying  an  autoregressive  scheme 
with  roots  on  or  very  near  the  unit  circle,  it  seems  to  me 
that  the  process  of  model  identification  of  observed  real 
time  series  has  greatest  scientific  insight  if  it  is  carried 
out  in  two  stages  in  which  one  first  finds  an  autoregressive 
operator  (with  a  simple  interpretation)  which  just  barely 
transforms  the  time  series  to  short  memory,  which  in  turn  is 
modeled  by  an  ARMA  whitening  filter. 
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5.  ARMA  Model  Identification  by  Estimating  MA(°°)  and  Subset 
Regression 

A  variety  of  approaches  have  been  proposed  by  statisticians 
for  identifying  the  orders  p  and  q  and  estimating  the  coefficients 
of  an  ARMA  (p,q)  model  for  an  observed  time  series.  An  approach 
that  we  use  (because  of  its  very  low  computational  cost) 
for  an  initial  identification  is  based  on  first  estimating  the 
coefficients  of  the  MA(°°)  representation: 

Y(t)  =  e  (t)  +  b^l)  e(t-l)  +  .  .  . 

with  residual  variance 

o’  =  E[  e 2 (t) ]  i  E[ Y2 (t) ] 

One  approach  to  estimating  MA(°°)  is  to  estimate  the  AR(°°) 
representation  by  an  approximating  AR(p)  representation  (we 
usually  use  the  Burg  algorithm  to  compute  its  coefficients). 

Then  one  solves  recursively  for  boo(j)  by 

*p<0)  b00(k)  +  ap(l)  bco(k- 1)  +  .  .  .  +ap (k)  b0O(0)  -  0,  k=l,2 . 

A  second  approach  to  estimating  MA(®)  is  to  compute  the 
cepstral  pseudo-correlations 

*<v)  =  I]  exp  (27Tiaiv)  log  f (ai)  dw  .  v=0,+l . 
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One  computes  \ Kv)  by  replacing  f(w)  by  a  windowed  spectral 
density  estimator 

T 

f(w)  -  l  exp  (-27Tivii))  k(n)  p  (v) 
v=»-T  n 

for  a  suitable  kernel  k(x)  and  truncation  point  M  (satisfying 
T/2  <  M  <  T). 

From  ip ( v)  one  can  compute  b^Cn)  by  the  recursive  formula 

(n+1)  bM(n+l)  -  l  (k+l)*(k+l)  bro(n-k) 
k=0 

The  spectral  formula  for  residual  variance  , 

log  =  /*  log  f(w)  dw  «  tp(0) 

A  A 

yields  an  estimator  oj,  when  one  replaces  f(u>)  by  f(u>). 

In  the  population  a  key  relation  is 

a2  <1  +  b  (1)  +  b  (2)  K .  .  }  =  1 

CO  CO  oo 

An  alternative  estimator  of  o, i  is  therefore 

OO 

a2  =*  (1  +  b2  (1)  +  b 2  (2)  +  .  .  . }  '1 

oo  OO  X  '  00  '  ' 

A  useful  signature  of  the  memory  and  ARMA  types  of  a  time  series 
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is  the  prediction  variance  horizon  function 

PVH(h)  -  o*  {1  +  b*(l)+...+b*(h-l)}  ,  h-1,2 . 

It  can  be  interpreted  as  representing  the  mean  square  error  of 
prediction  h  steps  ahead.  The  horizon  of  a  time  series  is 
defined  to  be  smallest  value  of  h  for  which  PVH(h)  is  greater 
than  a  suitable  value  (such  as  0.95). 

From  the  MA(»)  representation  one  forms  an  estimator 

p(v)  -  o*  (bM(v)  +  b^l)  boo(v+l)  +  .  .  .  } 

of  p  (v)  ,  and  an  estimator  b^Ck)  of  the  covariance  between 
Y(t)  and  c(t-k);  we  assume  Y(t)  has  been  normalized  to  have 
variance  1. 

Next  one  forms  the  joint  covariance  matrix  of  Y(t), 

Y (t- 1) , . . . , Y(t-m)  ,  e (t- 1) , . . . , e (t-m)  for  a  suitable  lag  m. 
Finally,  a  subset  regression  routine  is  used  to  determine  an 
ARMA  model 

t 

ap(0)  Y (t)  +  ap(l)  Y(t-1)  +  .  .  .+ap(p)  Y(t-p) 

=  b (0)  e (t)  +  bq (1)  e(t-l)+. . ,+b  (q)  e(t-q) 

with  as  many  zero  coefficients  as  possible  [note:  ap(0)  = 
b^(0)  =  1).  These  models,  called  subset  regression  ARMA  models, 


yield  ARMA  spectral  estimates 


,  lha<e2”1“>|2 

°p.q  |^(e2iriU)|i! 


where 


gp(z)  =  ap(0)  +  ap(l)z+. . .+ap(p)  zp, 
h^(z)  =  bq(0(  +  bq(l)z+. . .+b^(q)  zq 

For  a  monthly  economic  time  series,  with  short  memory,  one 
often  finds  p=2 ,  q=12,  with  b12(12)  the  only  non-zero  moving 
average  coefficient.  The  transfer  function  g2(z)  models  the  low 
frequency  component,  and  h^2(z)  models  the  seasonal  component. 

We  use  the  notation  ARMA(1,2;12)  for  this  model.  We  use 
AR(1,12,13)  for  a  subset  ARMA  model  with  p=13,  q=0,  and  al3U)  , 
a-j^(12)  ,  a^(13)  the  only  non-zero  coefficients. 
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6.  An  example  of  quantile  spectral  analysis 

To  illustrate  the  quantile  approach  to  spectral  estimation, 
let  us  consider  New  York  City  monthly  average  temperatures 
1946-1959  (such  a  series  might  be  collected  jointly  with  New 
York  City  monthly  births  1947-1960  to  investigate  if  there  is 
a  relationship  between  atmospheric  temperature  and  birth  rate) . 
One  suspects  a  seasonal  period  of  12  (equal  to  oj  =  .08333). 

Original  data  signatures:  Mean  54.6,  median  55.1 
Standard  deviation  of  the  informative  quantile  function  is  .2648 
with  log  -1.33;  this  diagnostic  measure  is  -1  for  Gaussian  time 
series.  The  values  IQ(0.01)  =  -.48  and  IQ(0.99)  =  .42  provide 
decisive  evidence  that  the  distribution  is  not  Gaussian,  but  is 
short  tail  [which  in  the  case  of  time  series  represents  a 
harbinger  of  a  sine  wave  plus  noise  model]. 

Sample  Spectral  Density  f:  Median  .06,  variance  50  are 
strong  evidence  that  the  time  series  is  long  memory.  Quantile 
density  q(u)  of  f(uj)  has  maximum  value  30892;  extreme  values  of 
quantile  Q(u)  of  f  are  25  and  79  [such  large  values  indicate  the 
presence  of  a  very  narrow  band  signal].  The  graph  of  D(u) 
confirms  this  conclusion. 

Correlations .  Mean  square  .26  is  strong  evidence  that 
time  series  is  long  memory  with  sine  wave  components. 

AR  order  determination.  As  usual,  the  same  best  and  second 
best  AR  orders  are  reported  by  AIC  and  CAT.  The  orders  are  9 

A 

and  7,  with  o£  equal  to  .097  and  .099  respectively. 

Delta  (index  of  regular  variation)  at  u>=0,  0.08333: 
Autoregressive  and  kernel  spectral  density  estimators  both 


indicate  6=0  at  u^O  and  6=2  at  uf=0. 08333. 


Comparison  of  Yule~Walker  and  Burg  estimators  of 
autoregressive  coefficients  and  partial  correlations.  If  the 
estimators  are  significantly  different,  we  would  conclude  that 
the  time  series  is  long  memory  and  an  ARMA  model  is  not 
applicable . 


Autoregressive  coefficients  Partial  Correlations 


Index 

Yule-Walker 

Burg 

Yule-Walker 

*urg 

1 

-.553 

-.388 

.818 

.825 

2 

.020 

-  .093 

-.634 

-  .661 

3 

.044 

.023 

-.497 

-.576 

4 

.162 

.117 

-.454 

-.491 

5 

.179 

.222 

-  .332 

-.399 

6 

.095 

.129 

-.192 

-.221 

7 

.120 

.170 

-.153 

-.175 

8 

.135 

.144 

-.047 

-.048 

9 

-.162 

-  .255 

.162 

.255 

Our  current  experience  leads  us  to  believe  that  the  above 
estimators  are  just  barely  "significantly  different.”  However 
the  spectral  densities  seem  to  yield  similar  results. 

Comparison  of  spectral  estimators.  The  Burg  AR  spectral 
estimator  is  strongly  peaked  with  peak  at  to=0.0833  (period  12). 
The  spectral  distribution  rises  form  0.03  to  0.96  over  the 
interval  (.076,  .090)  corresponding  to  periods  (13.09,  11.08). 

The  sample  spectral  distribution  rises  from  .05  to  .92,  while  the 
Yule-Walker  autoregressive  spectral  distribution  rises  from  .06 
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ARMA  Subset  Regression  Based  on  Estimating  MA(°°)  .  Our 
algorithm  yields  the  canonical  models  ARMA(1,2;12)  and 
AR(1,12,13).  That  an  ARMA  model  should  not  be  fit  to  the  time 
series  of  NYC  monthly  temperatures  is  indicated  by  the  lack  of 
fit  of  the  ARMA  spectral  distributions  to  the  sample  spectral 
distribution,  since  the  former  rise  from  .16  to  .85  for 
cepstral-based  MAC00),  arid  from  .15  to  .81  for  Burg-based  MA(°°)  , 
over  the  frequency  interval  (0.076,  0.090).  We  have  not 
investigated  whether  the  ARMA  models  identified  would  fit 
better  if  their  parameters  were  estimated  more  efficiently  than 
they  are  by  our  subset  regression  algorithm. 

Conclusion.  A  model  for  Y(t)  which  has  maximum  insight 
is:  Y (t)  =  S(t)  +  Z(t),  where  S(t)  is  a  function  with  period 
12  [initially  estimated  by  the  monthly  means],  and  Z(t)  is  a 
stationary  time  series.  The  spectrum  of  interest  here  would 
seem  to  be  that  of  Z(t).  However  if  one  insists  on  a  spectral 
density  estimator  for  Y(t)  -  Y,  a  satisfactory  answer  may  be 
the  autoregressive  spectral  density  estimator  of  order  9  with 
coefficients  computed  by  a  Burg  (or  least  squares)  algorithm 
rather  than  by  the  Yule-Walker  equations.  Since  AR  order 
determining  criteria  do  not  apply  for  this  model,  the  question 
is  open  if  one  should  not  base  the  AR  spectral  estimator  on 
an  AR(13) . 


7.  Summary  of  quantile  spectral  analysis 

Given  a  sample  Y(t),  t**l,  2 ,  .  .  .  , T,  quantile  spectral  analysis 
first  forms  the  standardized  time  series  {Y(t)  -  median}  *  {twice 
interquartile  range}  for  which  one  computes  the  sample  spectral 
density  (periodogram) ,  sample  correlations,  sample  partial 
correlations,  sample  cepstral  pseudo-correlations  (and  even 
sample  inverse-correlations).  The  output  we  propose  that  one 
examine  to  identify  time  series  memory  and  spectral  density 
estimator  is  as  follows: 

(I)  IQ(u)  and  D(u)  plots  of  the  original  data  (to  identify 
its  probability  distribution) ,  sample  spectral  density,  and 
sample  correlations. 

(II)  order  determining  criterion  functions  AIC  and  CAT; 
Yule-Walker  estimators  of  autoregressive  coefficients  for  best 
and  second  best  AR  orders;  Burg  estimators  of  autoregressive 
coefficients  for  the  maximum  of  the  best  and  second  best  AR 
orders . 

(III)  Diverse  Spectral  density  (and  corresponding  Spectral 

distribution  function)  estimators  computed  by  the  following 
methods:  (a)  sample  spectral  density,  (b)  AR  spectral  density 

of  best  order  with  Yule-Walker  computed  coefficients;  (c)  AR 
spectral  density  with  least  squares  (Burg  algorithm)  coefficients 
(d)  ARMA  Spectral  density  estimators  with  coefficients  determined 

by  subset  regression,  based  on  an  MA(°°)  representation  computed  from 
an  approximating  AR  scheme,  (e)  ARMA  spectral  density  estimators 
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with  coefficients  determined  by  subset  regression,  based  on  an 
MA(°°)  representation  formed  from  sample  cepstral  pseudo¬ 
correlations.  Each  of  these  methods  also  yields  estimators  of  o^. 

(IV)  Each  estimated  spectral  density  is  used  to  compute 
estimators  6^  of  the  index  6  of  regular  variation  of  f(u>)  at 

io=0  and  a  specified  seasonal  frequency,  [a  formula  for  6^  is  given  below], 

(V)  An  estimated  spectral  density  is  formed  called  the 
local  quantile  spectral  estimator;  it  is  based  on  the  median 
and  quartiles  of  the  set  of  values  of  the  sample  spectral 
density  in  a  specified  neighborhood  of  an  equi-spaced  grid  of 
frequencies . 

The  approach  to  time  series  model  identification  outlined 
in  this  paper  can  be  considered  exploratory  data  analysis  since 
the  diverse  criterion  functions  utilized  require  no  theory  for 
interpretation  if  one  is  willing  to  base  one's  conclusions  on 
the  empirically  observed  values  of  the  criteria  for 
representative  time  series.  On  the  other  hand,  the  criteria 
are  based  on  clearly  stated  concepts  of  probability  theory, 
and  one  could  study  theoretically  the  distribution  of  the 
criteria  for  various  time  series  models.  The  ultimate  validity 
of  this  approach  (and  refinements  of  its  reasoning  process) 
can  only  be  accomplished  by  a  series  of  examples  of  important 
practical  applications. 

Among  the  important  questions  for  further  research  is  more 
theory  concerning  the  index  6  of  regular  variation  of  a 
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spectral  density  f(w)  at  a  frequency  <dq,  defined  by  the 
representation 

f(id)  =  ((j-u)0)"6  L(oj-ii)0) 

where  L(x)  is  slowly  varying  as  x  tends  to  0.  No  and  short 
memory  time  series  have  6=0  at  all  frequencies.  Long  memory 
time  series  have  6^0  at  some  frequency.  To  estimate  6  from  a 
consistent  (windowed  or  AR)  estimator  f (— )  of  the  spectral 
density  at  a  grid  of  equi- spaced  frequencies,  we  choose  m  so 
that  m/n  =  u)Q  and  form  a  sequence 

6k  =  ^  log  f(ij!p)  -  log  f(hil±5)  . 

One  conjectures  that  if  n  and  k  are  integers  tending  to  <*>  in 
such  a  way  that  k/n  tends  to  0,  then  6  =  lim  5^ 

A  value  of  6=2  indicates  a  sharp  peak  in  the  spectral  density, 
that  differencing  once  may  be  justified,  or  that  a  periodic  signal 
should  be  fitted.  A  value  of  6=- . 2  indicates  a  sharp  trough 
in  the  spectral  density  which  may  be  the  result  of  over¬ 
differencing.  The  convergence  of  6^  to  6  is  very  slow,  and  we 
currently  use  t-he  shape  of  the  curve  6^  rather  than  any  of  its 
individual  values  as  the  evidence  for  interpretation. 
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